[core] Support push down branchesTable by branchName #6231
@JingsongLi Could you take some time to help me review this PR? Thank you so much~
Thanks @huyuanfeng2018 for the contribution. Do your scenarios involve many branches? I rarely see cases with many branches, so I haven't optimized this path.
When using CDC synchronization, we create a Paimon branch every hour so that snapshots are split precisely by hour (tags can't do that), so there will be many branches.
Got it, I will take a look~ |
* upstream/master: (23 commits)
  [flink] Use Paimon format table read for flink (apache#6246)
  [core] Ensure system tables use the correct identifier for loadTableToken in RESTTokenFileIO. (apache#6247)
  [spark] Enhance v1 write merge schema test coverage (apache#6249)
  [spark] Eliminate duplicate convertLiteral invocations (apache#6250)
  [python] Fix failing to read 1000cols (apache#6244)
  [python] Expose CatalogFactory and Schema directly (apache#6243)
  [doc] Modify Python API to JVM free (apache#6242)
  [python] Fix multiple write brefore once commit (apache#6241)
  [core] Support push down branchesTable by branchName (apache#6231)
  [cdc] Fix PostgreSQL DECIMAL type conversion issue (apache#6239)
  [arrow] Optimize Arrow string write performance (apache#6240)
  [core] Fix checkpoint recovery failure for compacted changelog files (apache#6173)
  [core] RESTCatalog: add DLF OSS endpoint support and improve configuration merge (apache#6232)
  [core] fix RESTCatalog#listViews for system database (apache#6233)
  [core] Introduce 'ignore-update-before' to ignore UD only (apache#6235)
  [python] Fix DLF partition statistical error (apache#6237)
  [python] Add _VALUE_STATS_COLS param to fix parse wrong bytes (apache#6234)
  [ci] Rename to Python Check Code Style and Test
  [python] Rename binary row to generic row
  [hotfix] Remove methods in SchemaManager for SchemasTable
  ...
Purpose
Linked issue: close #xxx
In Paimon's system table $branches, the previous implementation did not support pushing down filter conditions on branch_name when a user queries specific branches.
This means that even if we only want to query one or a few specific branches, Paimon reads all branch information and leaves the filtering to the compute engine.
This change pushes filter conditions on the branch_name field down into the $branches table scan, so that only the matching branches are read. This should improve query performance, especially for tables with many branches.
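To illustrate the idea, here is a minimal, self-contained sketch of the difference between filtering after a full read and pushing a branch-name filter into the read. It is not the code in this PR: `BranchFilterPushdownSketch`, `BranchStore`, `scanAll`, and `scanWithPushdown` are hypothetical names, and the `loads` counter only stands in for the cost of reading one branch's metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; these are not Paimon's actual classes. */
class BranchFilterPushdownSketch {

    /** Stand-in for branch metadata storage where each read has a cost. */
    static final class BranchStore {
        private final Map<String, Long> createTimes = new LinkedHashMap<>();
        int loads = 0; // number of branches whose metadata was actually read

        void add(String name, long createTime) {
            createTimes.put(name, createTime);
        }

        List<String> listNames() {
            return new ArrayList<>(createTimes.keySet());
        }

        boolean exists(String name) {
            return createTimes.containsKey(name);
        }

        /** Reads one branch and returns a simplified "row" for it. */
        String load(String name) {
            loads++;
            return name + "@" + createTimes.get(name);
        }
    }

    /** No pushdown: read every branch, then filter by name. */
    static List<String> scanAll(BranchStore store, Set<String> wanted) {
        List<String> rows = new ArrayList<>();
        for (String name : store.listNames()) {
            String row = store.load(name);
            if (wanted.contains(name)) {
                rows.add(row);
            }
        }
        return rows;
    }

    /** Pushdown: read only the branches named by the filter. */
    static List<String> scanWithPushdown(BranchStore store, Set<String> wanted) {
        List<String> rows = new ArrayList<>();
        for (String name : wanted) {
            if (store.exists(name)) {
                rows.add(store.load(name));
            }
        }
        return rows;
    }
}
```

With 24 hourly branches and a filter on a single branch name, both methods return the same row, but `scanAll` reads 24 branches while `scanWithPushdown` reads one. The sketch only covers equality and IN-style filters that reduce to a set of names; how other predicates are handled is outside its scope.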
Tests
org.apache.paimon.flink.BranchSqlITCase#testBranchesTableFilter, with more cases added to the method.
API and Format
Documentation